Estimating the generation interval from the incidence rate, the optimal quarantine duration and the efficiency of fast switching periodic protocols for COVID-19

The transmissibility of an infectious disease is usually quantified in terms of the reproduction number \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt representing, at a given time, the average number of secondary cases caused by an infected individual. Recent studies have enlightened the central role played by w(z), the distribution of generation times z, namely the time between successive infections in a transmission chain. In standard approaches this quantity is usually substituted by the distribution of serial intervals, which is obtained by contact tracing after measuring the time between onset of symptoms in successive cases. Unfortunately, this substitution can cause important biases in the estimate of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt. Here we present a novel method which allows us to simultaneously obtain the optimal functional form of w(z) together with the daily evolution of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt, over the course of an epidemic. The method uses, as unique information, the daily series of incidence rate and thus overcomes biases present in standard approaches. We apply our method to one year of data from COVID-19 officially reported cases in the 21 Italian regions, since the first confirmed case on February 2020. We find that w(z) has mean value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\overline{z}} \simeq 6$$\end{document}z¯≃6 days with a standard deviation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma \simeq 1$$\end{document}σ≃1 day, for all Italian regions, and these values are stable even if one considers only the first 10 days of data recording. This indicates that an estimate of the most relevant transmission parameters can be already available in the early stage of a pandemic. We use this information to obtain the optimal quarantine duration and to demonstrate that, in the case of COVID-19, post-lockdown mitigation policies, such as fast periodic switching and/or alternating quarantine, can be very efficient.

(1) which for φ = φ * is decorrelated from the daily number of tests n T (m). We have found that the generation time distribution w(z) obtained from the maximization of LL (φ) = LL {I (φ) (φ)}, {R c }, {µ}, z, σ, V is very similar to the one obtained from LL ({I}, {R c }, {µ}, z, σ, V ). To support our result of a very peaked distribution w(z) we here presents a numerical check which is substantially based on the inversion of Eq.(1). Our approach consists in generating the numerical time series {I (φ) } implementing in our numerical procedure (see Methods in the main text) the values of {R c } and {µ} and a corresponding to the maximum of LL (φ) , for τ = 0.15. We have only one free parameter which is the initial number of infected individuals and setting this number to I (φ) (0) = 250 we find that the numerically simulated I (φ) (m) (open magenta diamonds in Fig.Supp.1a) is in very good agreement with the recorded one (green curve in Fig.Supp.1a). We next combine the numerical {I (φ) } with the experimental value of {n T } to simulate the total number of detected infected individuals I(m) = I (φ) (m) 1 + n T (m) φNp , from Eq.(1). We find a very good agreement between the simulated I(m) (filled magenta triangles in Fig.(Supp.1a)) and the recorded one (black continuous line in Fig.(Supp.1a)). Interestingly we find that this agreement is even better than the one obtained in Fig.1 of the main text, with the numerically simulated signal which fits the fluctuations of the recorded one. Furthermore, we also find that the log-likelihood estimated for the numerical series, plotted in Fig.Supp.2a presents a dependence on a and τ very similar to the one observed in Fig.3a of the main text for the experimental one. These results support our decomposition between {I} which depends on {n T } and {I (φ) } which conversely is independent of the number of tests.
We next consider numerical simulations still considering a mean value z = 6.2 days but with a larger σ = 4.2 days. We find that for any choice of I (φ) (0), implemented in the generation tree algorithm, it was never possible to generate a numerical sequence {I (φ) } in good agreement with the experimental one. The best numerical I , a, τ, V ) however exhibits a functional dependence on a and τ very different from the experimental one. A maximum at z 6 is still observed at small τ , but it is not the dominant one. This difference suggests that the behavior of LL observed in instrumental data is difficult to be explained by Eq.(1), unless one does not implement a sufficiently small value of σ in w(z). This further supports the hypothesis that the true probability distribution of generation times w(z) is a very peaked distribution.

REMOVAL OF THE WEEKLY PERIODICITY BY FOURIER FILTERING
We define I F the signal of the daily incidence of COVID-19 in Lombardy I(m) filtered after the application of a Butterworth filter in the frequency range [0, 1/7.5[∪]1/6.5, 1/365]days −1 . The corresponding log-likelihood LL {I F }, {R c }, {µ}, z, σ, V is plotted in Fig.Supp.3 as a function of z = aτ , for different values of τ , which correspond to different values of σ = τ z. Results show that the dominant peak is observed for τ = 0.1 and z = 6 in agreement with the results for the log-likelihood of the disentangled signal LL (φ) (Fig.3b of the main text). In Fig.3a of the main text we present the dependence of LL ({I}, {R c }, {µ}, τ , σ, V ), for the optimal series {R c } and {µ} and different choices of the parameter a and τ of the Gamma distributed w(z). In Fig. 3b of the main text the same analysis was presented for LL {I (φ) }, {R c }, {µ}, τ , σ, V . In this section we present exactly the same study but we assume that w(z) obeys either a log-normal (w ). The dependence of both functions on two parameters µ, λ can be completely reformulated in terms of the average value z and standard deviation. In particular one has z = e µ+λ 2 /2 and σ = z e λ 2 − 1 for the log-normal distribution, whereas z = λΓ(1 + 1/µ) and standard deviation σ = λ Γ(1 + 2/µ) − Γ(1 + 1/µ) 2 for the Weibull distribution. Also in this case we can conclude that no significant differences can be observed between the Gamma and the log-normal distribution with LL {I (φ) }, {R c }, {µ}, τ , σ, V presenting a very similar dependence on z when the same value of σ is imposed in both distribution. Concerning the comparison with the Weibull distribution, we again find a smaller but comparable value of maximum of LL {I (φ) }, {R c }, {µ}, τ , σ, V with optimal values for z and σ very similar to those found for the Gamma and the log-normal distribution. We can therefore conclude that there is no important difference between a Gamma and a log-normal or a Weibull distributed w(z) which all lead to very similar optimal values for z and σ.

THE ROLE OF DIFFERENT LEARNING PERIODS
In this section we present the result of our procedure considering a learning period m ∈ [1, m learning ]. More precisely, we assume a Gamma distributed w(z) and for different choices of the parameters z and σ we find the optimal series {R c } m=1,.,m learning and {µ} m=1,.,m learning which maximize the log-likelihood LL ( The estimate of the optimal values of z ∈ [5.8, 6] and σ ∈ [0.8, 0.95] obtained in the learning period can be used to estimate R c (m) and µ(m) in the following testing period m > m learning . This information can be further used to generate synthetic sequences according to our numerical procedure (see Methods in the main text). Results of this procedure presented in Fig.Supp.7 show that, already for m learning = 10 days, the simulated {I(m)} is in very good agreement with the experimental one. In particular, the simulated {I(m)} is able to capture also the daily fluctuations of the experimental incidence rate. This result indicates that these fluctuations do not have only a stochastic origin but also reflect genuine changes of the transmission rate adequately captured by R c (m). This demonstrates that our analysis is not affected by over-fitting. As further support, in Fig.Supp.7 we also show that implementing an erroneously estimated value of σ, during the learning period, leads to a much worst agreement during the testing period.

THE DAILY INCIDENCE OF PATIENTS IN INTENSIVE CARE UNITS
In this section we apply our procedure considering for the incidence rate the daily number of individuals entering the intensive care units I ICU . This information is available in Italy since December 2020 and we use one year of data up to December 2020. Since the daily value of I ICU for a single region is usually small, we perform our analysis considering the I ICU for the whole italian territory. The time evolution of I ICU (m) is plotted in the left panel of

THE OTHER ITALIAN REGIONS
In this section we present the same analysis performed in the main text for the region Lombardy for other five regions with the largest number of inhabitants, i.e. Lazio, Campania, Sicily, Veneto and Emilia-Romagna. The five regions are distributed over the italian territory, from North to South, and are therefore representative of the entire country. For each of the five regions we present a figure with four panels. In panel (a) we plot both the daily incidence rate I(m) and the disentangled incidence rate I (φ) (m) as function of m. In panel (b) we plot the best estimate of the reproduction number R c (m) and of µ(m) for the parameter of w(z) leading to a maximum in LL ({I}, {R c }, {µ}, z, σ, V ). In panel (c), we plot the LL ({I}, {R c }, {µ}, z, σ, V ) for a Gamma distributed w(z) ad function of z for different τ , or equivalently, different σ values. Finally in panel (d) we plot LL {I (φ) }, {R c }, {µ}, z, σ, V versus z for the disentangled incidence rate {I (φ) } evaluated at φ = φ * . All results are obtained implementing a Gamma distribution for w(z) but we find similar results using a Weibull distribution. Very similar results are found for the other 15 Italian regions.
The different figures confirm the same result of the main text indicating a very peaked w(z) with average value z presenting small fluctuations from region to region. A slightly different pattern is only obtained for LL {I (φ) , {R c }, {µ}, z, σ, V for the Lazio and the Sicily regions, where the maximum at z 6 days is still present but it is not the dominant one. Indeed, in Lazio the dominant maximum is at z 10 days whereas it is at z 3 days in Sicily. We do not have a clear explanation for this result but we could attribute the origin of the other maxima to some spurious periodicity introduced in the passage from {I} to {I (φ) }.
A. Lazio